Streaming Balanced Graph Partitioning for Random Graphs
Abstract
There has been a recent explosion in the size of stored data, partly due to advances in storage technology and partly due to the growing popularity of cloud computing and the vast quantities of data generated. This motivates the need for streaming algorithms that can compute approximate solutions without full random access to all of the data. We model the problem of loading a graph onto a distributed cluster as computing an approximately balanced k-partitioning of a graph in a streaming fashion with only one pass over the data. We give lower bounds on this problem, showing that no algorithm can obtain an o(n) approximation with a random or adversarial stream ordering. We analyze two variants of a randomized greedy algorithm, one that prefers the arg max and one that is proportional, on random graphs with embedded balanced k-cuts, and we are able to theoretically bound the performance of each algorithm: the arg max algorithm is able to recover the embedded k-cut, while, surprisingly, the proportional variant cannot. This matches the experimental results in [25].
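The two greedy variants named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact rules, so several details here are assumptions made for illustration: the hard per-part `capacity`, uniform tie-breaking, the uniform fallback when a vertex has no placed neighbors, and the hypothetical `planted_graph` generator (one common way to embed a balanced k-cut in a random graph, with assumed parameters `p_in` and `p_out`). None of the names below come from the paper.

```python
import random


def planted_graph(n, k, p_in, p_out, rng):
    """Hypothetical generator: vertex v sits in block v % k. An edge appears
    with probability p_in inside a block and p_out across blocks."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if u % k == v % k else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def stream_partition(adj, order, k, capacity, variant, rng):
    """Place each vertex of `order` into one of k parts in a single pass.

    variant="argmax": pick the open part holding the most placed neighbors.
    variant="proportional": pick an open part with probability proportional
    to its number of placed neighbors.
    """
    if variant not in ("argmax", "proportional"):
        raise ValueError("unknown variant: %r" % variant)
    part_of = {}
    sizes = [0] * k
    for v in order:
        # Count neighbors of v that have already been placed, per part.
        counts = [0] * k
        for u in adj[v]:
            if u in part_of:
                counts[part_of[u]] += 1
        # Assumed balance rule: a part is closed once it reaches capacity.
        open_parts = [p for p in range(k) if sizes[p] < capacity]
        if not open_parts:
            raise ValueError("capacity too small for the stream")
        weights = [counts[p] for p in open_parts]
        if sum(weights) == 0:
            # No placed neighbor in any open part: fall back to uniform.
            choice = rng.choice(open_parts)
        elif variant == "argmax":
            best = max(weights)
            ties = [p for p, w in zip(open_parts, weights) if w == best]
            choice = rng.choice(ties)
        else:
            choice = rng.choices(open_parts, weights=weights, k=1)[0]
        part_of[v] = choice
        sizes[choice] += 1
    return part_of


def cut_size(adj, part_of):
    """Number of edges whose endpoints landed in different parts."""
    return sum(1 for u in adj for v in adj[u]
               if u < v and part_of[u] != part_of[v])
```

To compare the variants informally, one could generate a graph with `planted_graph`, shuffle the vertex order with the same `random.Random` instance, run `stream_partition` once per variant, and compare the resulting `cut_size` values against the number of cross-block edges in the planted graph.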
Similar resources
Streaming Balanced Graph Partitioning Algorithms for Random Graphs
With recent advances in storage technology, it is now possible to store the vast amounts of data generated by cloud computing applications. The sheer size of ‘big data’ motivates the need for streaming algorithms that can compute approximate solutions without full random access to all of the data. In this paper, we consider the problem of loading a graph onto a distributed cluster with the goal...
LogGP: A Log-based Dynamic Graph Partitioning Method
With the increasing availability and scale of graph data from Web 2.0, graph partitioning has become one of the efficient preprocessing techniques for balancing the computing workload. Since the cost of partitioning the entire graph is strictly prohibitive, there are some recent tentative works towards streaming graph partitioning, which can run faster, be easily parallelized, and be incrementally updated. ...
Modeling, Analysis, and Experimental Comparison of Streaming Graph-Partitioning Policies: A Technical Report
In recent years, many distributed graph-processing systems have been designed and developed to analyze large-scale graphs. For all distributed graph-processing systems, partitioning graphs is a key part of processing and an important aspect of achieving good processing performance. To keep the overhead of partitioning graphs low, even when processing the ever-increasing modern graphs, many pre...
Remarks on Distance-Balanced Graphs
Distance-balanced graphs are introduced as graphs in which every edge uv has the following property: the number of vertices closer to u than to v is equal to the number of vertices closer to v than to u. Basic properties of these graphs are obtained. In this paper, we study the conditions under which some graph operations produce a distance-balanced graph.
Modeling, analysis, and experimental comparison of streaming graph-partitioning policies
In recent years, many distributed graph-processing systems have been designed and developed to analyze large-scale graphs. For all distributed graph-processing systems, partitioning graphs is a key part of processing and an important aspect of achieving good processing performance. To keep the overhead of partitioning graphs low, even when processing the ever-increasing modern graphs, many previo...
Journal title:
- CoRR
Volume abs/1212.1121
Publication date: 2012